Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach
Similar resources
Learning Decision Theoretic Utilities through Reinforcement Learning
Probability models can be used to predict outcomes and compensate for missing data, but even a perfect model cannot be used to make decisions unless the utility of the outcomes, or preferences between them, are also provided. This arises in many real-world problems, such as medical diagnosis, where the cost of the test as well as the expected improvement in the outcome must be considered. Relat...
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit t...
A Reinforcement Learning Approach to Online Learning of Decision Trees
Online decision tree learning algorithms typically examine all features of a new data point to update model parameters. We propose a novel alternative, Reinforcement Learning-based Decision Trees (RLDT), that uses Reinforcement Learning (RL) to actively examine a minimal number of features of a data point to classify it with high accuracy. Furthermore, RLDT optimizes a long-term return, providin...
Reinforcement Learning in Distributed Domains: An Inverse Game Theoretic Approach
We consider the design of multi-agent systems (MAS) so as to optimize an overall world utility function when each agent in the system runs a Reinforcement Learning (RL) algorithm based on its own private utility function. Traditional game theory deals with the "forward problem" of determining the state of a MAS that will ensue from a specified set of private utilities of the individual agents. ...
To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning
Most work in reinforcement learning (RL) is based on discounted techniques, such as Q learning, where long-term rewards are geometrically attenuated based on the delay in their occurrence. Schwartz recently proposed an undiscounted RL technique called R learning that optimizes average reward, and argued that it was a better metric than the discounted one optimized by Q learning. In this paper we...
Journal
Journal title: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2019
ISSN: 2374-3468,2159-5399
DOI: 10.1609/aaai.v33i01.33017949